Abstract

We study group-testing algorithms for resolving broadcast conflicts on a multiple 
access channel (MAC) and for identifying the dead sensors in a mobile ad hoc wireless 
network. In group-testing algorithms, we are asked to identify all the defective items in 
a set of items when we can test arbitrary subsets of items. In the standard group-testing 
problem, the result of a test is binary — the tested subset either contains defective items 
or not. In the generalized versions we study in this paper, the result of each test
is non-binary. For example, it may indicate whether the number of defective items 
contained in the tested subset is zero, one, or at least two. 

We give adaptive algorithms that are provably more efficient than previous group
testing algorithms. We also show how our algorithms can be applied to conflict
resolution on a MAC and to dead sensor diagnosis. Dead sensor diagnosis poses an
interesting challenge compared to MAC resolution, because dead sensors are not locally
detectable, nor are they themselves active participants.

1 Introduction 

Wireless communication has renewed interest in algorithms for dealing with conflicts and 
failures among collections of communicating devices. For example, when a collection of 
wireless devices compete to communicate with a particular access point, the access point 
becomes a multiple access channel (MAC), which requires a conflict-resolution method to 
allow all devices to send their packets in a timely manner. In large deployments, the need for 
conflict resolution among devices may be further complicated by their physical distribution, 
as the devices may form an ad hoc wireless network. The traditional way a base station 
communicates with devices in an ad hoc network is via broadcast-and-respond protocols [1], 
which have a simple structure: Messages are broadcast from a base station to the n sensors in 
such a network using a simple flooding algorithm (e.g., see [2]) and responses to this message 
are aggregated back along the spanning tree that is formed by this broadcast. Because
the flooding algorithm is topology-discovering, the spanning tree defined by the flooding 
algorithm can be different with each broadcast. This mutability property is particularly 
useful for mobile sensors, since their network adjacencies can change over time, although 
we assume they are not moving so fast that the topology of the spanning tree defined by 
a broadcast changes before the aggregate response from the broadcast is received back at 
the base station. A new challenge arises in this context, however, when devices fail (e.g., by 
running down their batteries) and we wish to efficiently determine the identities of the dead 
sensors. 

1.1 Group Testing 

In this paper, we present and analyze new algorithms for group testing to solve conflict 
resolution in MACs and dead sensor diagnosis. In the group testing problem, we are given a 
set of n items, d of which are defective (bounds on the value of d may or may not be known, 
depending on the context). A configuration specifies which of the items are defective. Thus,
there are $\binom{n}{d}$ configurations of d defectives among the n items. To determine which
of the n items are defective, we are allowed to sample from the items so as to define arbitrary
subsets that can be tested for contamination. In the standard group testing problem, each
test returns one of two values: either the subset contains no defectives or it contains at least
one defective. Therefore, there is an information-theoretic lower bound of
$\lceil \lg \binom{n}{d} \rceil \approx d \lg n$ tests (when d is small relative to n), in the
worst case, for any binary-result group testing algorithm.
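To make the counting argument concrete, the short Python sketch below (our illustration; the function names are hypothetical) computes $\lceil \lg \binom{n}{d} \rceil$ exactly with integer arithmetic. For comparison it also computes $\lceil \log_3 \binom{n}{d} \rceil$, which is what the same counting argument gives when every test has three possible outcomes, as in the ternary tests studied below.

```python
import math


def binary_lower_bound(n: int, d: int) -> int:
    """Smallest t with 2**t >= C(n, d): worst-case tests for any
    binary-result (0 / 1+) group testing algorithm."""
    configs = math.comb(n, d)
    # (configs - 1).bit_length() equals ceil(log2(configs)) for configs >= 1,
    # and avoids floating-point rounding at exact powers of two.
    return (configs - 1).bit_length()


def ternary_lower_bound(n: int, d: int) -> int:
    """Smallest t with 3**t >= C(n, d): the analogous counting bound
    when every test has three outcomes (0 / 1 / 2+)."""
    configs = math.comb(n, d)
    t, reach = 0, 1
    while reach < configs:
        t += 1
        reach *= 3
    return t


if __name__ == "__main__":
    n, d = 1000, 2
    print(binary_lower_bound(n, d), ternary_lower_bound(n, d), d * math.log2(n))
```

For n = 1000 and d = 2 the exact binary bound is 19 tests, close to the d lg n approximation of about 19.9.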

Motivated by the applications mentioned above, we consider generalizations of the standard
group testing problem, where there can be three or more possible results of a contamination
test. In ternary-result group testing, a result indicates whether the subset contains
no defectives, one defective, or at least two defectives (i.e., the results are 0, 1, or 2+).
Generalizing further, we may allow for counting tests that return the exact number of defective
items present in the test. In either case, a one-defective result may either be identifying,
returning a unique identifier of the defective item, or anonymous, indicating, but not
identifying, that there is one defective item in the test. We are interested in the efficiency of
generalized group testing.

1.2 Multiple Access Channels 

In the multiple access channel problem [3, 4, 5, 6], a set of n devices share a communication
channel such that a subset D of the devices wish to use the channel to transmit a data
packet. In any time slice, some subset T of the devices may attempt a transmission on the 
channel. If there is only one device x from D in T, then it succeeds (and all parties learn the 
identity of x). Alternatively, if no device attempts to transmit, then all parties learn this 
as well. But if two or more devices attempt to transmit, then all parties learn only that a 
conflict has occurred (and no transmission is successful during this time slice). 

Each device independently decides, based on what it has observed, whether to attempt to
send its message; if an attempt is to be made, the device transmits with probability p,
decided by flipping a biased coin. We can reduce MAC conflict resolution to a group testing
problem.
For large n, this scenario can be approximated by using identifying ternary tests on a
random subset of size pn drawn from a set of n items, d of which are defective. In the MAC
situation, the probability that exactly i of the d devices will transmit is

$$\binom{d}{i} p^i (1-p)^{d-i}.$$

A conflict arises when two or more devices transmit.

In the testing situation, the probability that exactly i of the d defective items are within
the randomly selected subset of size pn is

$$\binom{d}{i}\binom{n-d}{pn-i} \Big/ \binom{n}{pn}.$$

The subset is impure when two or more defective items are in that subset.
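The approximation can be checked numerically. The Python sketch below (function names are our own, for illustration) evaluates the binomial MAC probability and the hypergeometric subset probability side by side; for n much larger than d the two agree closely, which is why a MAC round can be modeled as one ternary test on a random pn-subset.

```python
import math


def mac_prob(d: int, i: int, p: float) -> float:
    """P(exactly i of the d backlogged devices transmit), each independently with prob. p."""
    return math.comb(d, i) * p ** i * (1 - p) ** (d - i)


def subset_prob(n: int, d: int, m: int, i: int) -> float:
    """P(exactly i of the d defectives lie in a uniformly random m-subset of n items)."""
    return math.comb(d, i) * math.comb(n - d, m - i) / math.comb(n, m)


if __name__ == "__main__":
    n, d, p = 10_000, 5, 0.4
    m = round(p * n)  # subset size pn
    for i in range(d + 1):
        print(i, round(mac_prob(d, i, p), 5), round(subset_prob(n, d, m, i), 5))
    # Probability of a conflict (equivalently, of an impure subset): 2+ defectives.
    print("conflict:", 1 - mac_prob(d, 0, p) - mac_prob(d, 1, p))
```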

1.3 Dead Sensor Diagnosis 

In dead sensor diagnosis, there is an ad hoc network of n sensors, which can communicate 
with a base station using a broadcast-and-respond protocol along a broadcast tree that may 
be different with each broadcast. Furthermore, d of the sensors have failed (e.g., d batteries 
may have died, but we may not know the value of d), and we wish to identify which sensors 
are dead. This problem is complicated by the dynamic nature of the mobile sensors, since 
there is no local way to detect dead sensors — they simply become invisible to the sensors 
around them and there is no local way to distinguish this bad event from the common event 
of a live sensor moving out of range of the set of its former neighbors. 

Of course, the group controller could send out n broadcasts, each of which asks an
individual sensor to send a "heartbeat" acknowledgment message back as a response.
Assuming a reasonable time-out condition for non-responding sensors, this naive solution to
the dead sensor diagnosis problem could identify the dead sensors using a total of $O(n^2)$
messages spread across n communication rounds, which is inefficient. (It would violate the
broadcast-and-respond model to have the sensors respond individually to a single "who's
alive" broadcast, since the responses would not be aggregated and would require an
expected number of $O(n^{3/2})$ messages for a planar sensor network, would require sensors close
to the base station to do proportionally more work (hence, running down their batteries faster),
and would still have a delay of $O(n)$ communication rounds at the base station.) In this paper,
we are interested in efficient solutions to the dead sensor diagnosis problem that fit
the broadcast-and-respond model.

1.4 Previous Related Work 

For group testing, there is a tremendous amount of previous work on the standard (binary) 
version of the problem (e.g., see [3 El [101 [HI [131 111 [IS [16]), and there has been 
some work on anonymous generalized group testing algorithms (e.g., see [HI UHl US]), some 
of which require maintenance of large tables. The standard group testing problem has been 
applied to several other problems, including testing DNA clone libraries [20], testing blood
samples for diseases, data forensics [H], and cryptography [22] . 

The best previous algorithms for the standard group testing problem are adaptive, as
are our algorithms. That is, tests are performed one at a time, with the processing of a
single step usually requiring a parallel invocation across test elements, and the results
from previous tests are allowed to guide future tests. When the exact number,
d, of defective items is known, Hwang's generalized binary splitting algorithm [TT] for the
standard group testing problem exceeds the information-theoretic lower bound by at most
d - 1. This algorithm is basically a set of d parallel binary searches, which start out together
and eventually are split off. When d is not known but is an upper bound on the number
of defective items, at most one additional test is required [21]. Allemann [7] gives a
split-and-overlap algorithm for the standard group testing problem that exceeds the information-theoretic
lower bound on the number of tests by less than $0.255d + \frac{1}{2}\lg d + 5.5$ for d < n/2.
The 0.255 is replaced with 0.187 when d < n/38. When no constraint on the number, d, of
defectives is known in advance, Schlaghoff and Triesch [16] give algorithms that require 1.5
times as many tests as the information-theoretic lower bound for d defectives out of n items.

Work on multiple access channels (MACs) dates back to before the invention of the
Ethernet protocol, and there has been a fair amount of theoretical work on this problem
as well (e.g., see [3, 4, 5, 6]). (Using our terminology, a MAC algorithm is equivalent to a
ternary-result group testing algorithm with identifying results in the 1-result case.) There
is a simple halfway-split binary tree algorithm that achieves an expected 2.885d number of
steps (e.g., see [3]), which correspond to group tests in our terminology, to send d packets.
This algorithm was improved by Hofri [6], using a biased splitting strategy (which we review
below), to achieve an expected 2.623d steps. The best MAC algorithm we are familiar with
is due to Greenberg and Ladner [1], who claim that their algorithm uses an expected 2.32d
steps, assuming d is known in advance. Interestingly, in the lower-bound paper
of Greenberg and Winograd [5], the Greenberg-Ladner paper [1] is referenced as achieving
2.14d expected tests and, indeed, our analysis confirms this better bound for their algorithm
if d is known. Greenberg and Ladner also present an algorithm for estimating d if it is not
known in advance and, by our analysis, using this approximation algorithm with their MAC
algorithm achieves an expected $2.25d + O(\log d)$ steps (which is also better than the
bound claimed in [1]).

Normally, such concern over small improvements in the constant factor for a leading term 
of a complexity bound would be of little interest. In this case, however, the reciprocal of the 
constant factor for this leading term corresponds to the throughput of the MAC; hence, even 
small improvements can yield dramatic improvements in achievable bandwidth. Moreover, 
with the expanding deployment of wireless access points, there is a new motivation for MAC 
algorithms, particularly for environments where there are many wireless devices per access 
point. We are not familiar with any MAC algorithms that achieve our degree of efficiency 
without making additional probabilistic assumptions about the nature of packet traffic (e.g., 
see [ailElE]). 

We believe the dead sensor diagnosis problem is new, but there is considerable previous 
work on device fault diagnosis for the case in which devices can test each other and label the 
other device as "good" or "faulty," if the group controller can dictate the network's topology. 
For example, Yuan et al. [23] describe an aggregation protocol that assumes that sensors can
detect when neighbors are faulty.
1.5 Our Results 

In this paper, we present algorithms for generalized group testing when the result of each 
test may be non-binary. 

Ternary-result group testing can be applied to multiple access channels. We provide 
new MAC conflict-resolution algorithms that achieve an expected 2.054d steps if d is known
and $2.08d + O(\log d)$ tests if d is not known. Both of these bounds improve on the previous
constant factors for MAC algorithms and are based on the use of a new deferral technique
that demonstrates the power of procrastination in the context of MAC algorithms. We also
show that our MAC algorithm uses $O(d)$ steps with high probability, even if we reduce the
randomness used, and we provide an improved algorithm for estimating the value of d if it 
is not known in advance. 

Our group testing algorithms can be applied to dead sensor diagnosis, where the items are 
sensors and the defective items are the dead sensors. Our algorithms also are concise, which 
implies that each test can be formulated as a constant-size broadcast query from the base 
station such that the aggregated response to such a query can provide the possible results 
needed for ternary-result and counting group testing. This immediately implies efficient 
algorithms for the dead sensor diagnosis problem based on our ternary-result group testing 
algorithms. We also provide a novel counting-based group testing algorithm that uses an 
expected 1.89d tests to identify the d defective items. In addition, we give new deterministic 
ternary-result group testing algorithms using $O(d \lg n)$ broadcast rounds (which would use
a total of $O(dn \log n)$ messages for dead sensor diagnosis), with constant factors below the
lower bound for binary-result group testing. 

2 Motivation and Definitions 

We have already discussed how collision resolution for a multiple access channel corresponds
to ternary-result (0/1/2+) group testing, with identifying tests in the 1-result case. In this
section, we discuss further motivation for our other generalizations of group testing and we 
give some needed definitions as well. 

2.1 Some Definitions for Group Testing 

Recall that in the group testing problem we are given a set S of n items, d of which are
defective. We are allowed to form an arbitrary subset, $T \subseteq S$, and perform a group test on
T which, in the case of ternary-result group testing, has a ternary outcome. We say that T
is pure if T contains no defective items, tainted if it contains exactly one defective item, and
impure if it contains at least two defective items.

Furthermore, as mentioned above, in the case when T is tainted, we distinguish two
possible variations in the way the test result is conveyed to us. We say that the result is
identifying if it reveals the specific item, $x \in T$, that is defective. Otherwise, we say that
the result is anonymous: it states that T is tainted but does not identify the specific item
x in T that is defective.
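The following Python sketch models these definitions as test oracles, which is convenient for simulating the algorithms later in the paper. The names `ternary_test` and `counting_test` are our own, and representing the defectives as an explicit set is purely a simulation device; in the applications the result would come from the channel or from an aggregated sensor response.

```python
from collections.abc import Hashable, Iterable
from typing import Optional, Set, Tuple

PURE, TAINTED, IMPURE = 0, 1, 2  # the 0 / 1 / 2+ outcomes


def ternary_test(T: Iterable[Hashable], defectives: Set[Hashable],
                 identifying: bool = True) -> Tuple[int, Optional[Hashable]]:
    """Ternary group test on subset T.

    Returns (PURE, None), (TAINTED, x) or (IMPURE, None). In the tainted
    case x is the defective item under the identifying model and None
    under the anonymous model.
    """
    hits = [x for x in T if x in defectives]
    if not hits:
        return PURE, None
    if len(hits) == 1:
        return TAINTED, (hits[0] if identifying else None)
    return IMPURE, None


def counting_test(T: Iterable[Hashable], defectives: Set[Hashable]) -> int:
    """Anonymous counting test: the exact number of defectives in T."""
    return sum(1 for x in T if x in defectives)
```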

Finally, we say that a testing scheme is concise if each test subset $T \subseteq S$ that might
be formed by this scheme can be defined with an O(1)-sized expression E that allows us
to determine, for any item $x \in S$, whether x is in T in O(1) time using information only
contained in E and x (that is, we allow for a limited amount of memory to be associated
with x itself). For example, a test T might be defined by a simple regular expression,
101*10**011, for the binary representation of the name of each x in T (we assume that item
names are unique). The applications of MAC conflict resolution and dead sensor diagnosis
both require that the corresponding testing scheme be concise. Incidentally, most MAC
algorithms (e.g., see [3, 4, 5, 6]) also require that all devices have access to independent
random bits, but we show that this requirement is not strictly necessary.
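As a sketch of what a concise test looks like from an item's point of view, the Python function below decides membership using only the broadcast pattern and the item's own name. The encoding is an assumption on our part: we read each `*` in a pattern such as 101*10**011 as matching either bit, and we assume fixed-width binary names, so the check takes O(width) time, which is constant for a fixed name length.

```python
def in_test(pattern: str, name: int, width: int) -> bool:
    """Decide whether item `name` belongs to the subset T described by `pattern`.

    Hypothetical encoding: `pattern` has exactly `width` symbols, each '0',
    '1' or '*', and '*' matches either bit of the item's width-bit binary name.
    """
    bits = format(name, f"0{width}b")
    if len(pattern) != width or len(bits) != width:
        raise ValueError("pattern and name must both be exactly `width` bits")
    return all(c == "*" or c == b for c, b in zip(pattern, bits))


if __name__ == "__main__":
    pattern = "101*10**011"
    members = [x for x in range(2 ** 11) if in_test(pattern, x, 11)]
    # Three wildcards, so the pattern describes 2**3 = 8 names.
    print(len(members), [format(x, "011b") for x in members[:2]])
```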

2.2 Group Testing for Dead Sensor Diagnosis 

In this subsection, we present some simple reductions of the dead sensor diagnosis problem to 
generalized group testing. Our reductions fit the broadcast-and-respond paradigm of sensor 
communication, where the base station issues a broadcast and receives back an aggregated 
response, which is the result of an associative function applied to the sensor responses, and
which is computed by the sensors routing the combined response back to the base station. 

Although the sensors may be mobile, we assume that they are stable enough to support 
the broadcast-and-respond paradigm in a coarse-grain synchronous fashion, so that a message 
can be broadcast to all the active sensors and an aggregated response can come back in the 
same broadcast tree. From the standpoint of an individual sensor, this implies that it 
can receive a message from a neighbor, acknowledge that receipt, and rebroadcast to its 
neighbors (with similar receipts) as a coarse-grain atomic action. This assumption allows
us to implement the broadcast-and-respond protocol in practice: by
a simple induction, it implies that a sensor at height h in the broadcast tree need only wait
for h coarse-grain steps before it will have received all the aggregated responses from its 
children, which allows it to then send its aggregated response to its parent. 
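A single broadcast-and-respond round can be simulated as below: a flooding (breadth-first) broadcast defines this round's spanning tree, and the response phase folds partial results upward from the deepest sensors, so the base station waits as many coarse-grain steps as the tree's height. This is a minimal sketch under our own modeling assumptions: the adjacency map lists only live sensors (dead ones neither relay nor respond), and the aggregation function is associative and commutative, like the count and sum functions discussed next.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, Tuple


def broadcast_and_respond(
    adj: Dict[Hashable, Iterable[Hashable]],
    root: Hashable,
    value: Callable[[Hashable], int],
    combine: Callable[[int, int], int],
) -> Tuple[int, int]:
    """Return (aggregate received at root, height of this round's broadcast tree)."""
    # Broadcast phase: flooding (BFS) assigns each reached node a parent.
    parent = {root: None}
    depth = {root: 0}
    order = [root]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in parent:
                parent[v] = u
                depth[v] = depth[u] + 1
                order.append(v)
                queue.append(v)
    # Respond phase: deepest nodes first, so every child has reported
    # before its parent forwards the combined partial result upward.
    partial = {u: value(u) for u in order}
    for u in reversed(order):
        if parent[u] is not None:
            partial[parent[u]] = combine(partial[parent[u]], partial[u])
    return partial[root], max(depth.values())


if __name__ == "__main__":
    adj = {"base": ["a", "b"], "a": ["base", "c"], "b": ["base"], "c": ["a"]}
    # Count the live sensors; the base station itself contributes 0.
    print(broadcast_and_respond(adj, "base", lambda u: 0 if u == "base" else 1,
                                lambda x, y: x + y))
```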

Given a concise ternary-result group testing algorithm, A, we can use A to perform
dead sensor diagnosis by simulating each step of A with a broadcast and response. Because
A is concise, each test in A can be defined by a constant-sized expression E that is then
broadcast to each live sensor. Moreover, each live sensor x can determine in O(1) time
whether it belongs to the set T defined by E and can participate in an aggregate response
back to the base station. Thus, the remaining detail is to define aggregation functions whose
responses support useful test results, with either identifying or anonymous results in the tainted
cases:

• Count. We aggregate a simple count of the live sensors in T. Each live sensor x can
determine whether it belongs to T in O(1) time, since the broadcast is concise. Likewise,
each sensor y routing an answer back to the base station need only sum the counts
it receives from downstream routers (plus 1 if y is in T). This aggregation function
supports ternary responses, since the base station knows |T| and can compare this
value with the count returned by the live sensors. The count function is associative,
but it does not allow for identifying the dead sensor in the tainted case.

• Large-ID summation. Suppose that the n sensors are assigned ID numbers that are
guaranteed to all be greater than 2n such that no ID number can be formed as the sum
of two or more other ID numbers. Then a summation of the ID numbers of the live
sensors in T can be used to perform a ternary-result test, which will be an identifying
test in the case of a result indicating that T is tainted. Specifically, identifying each sensor
with its ID number, the difference between $\sum_{x \in T} x$ and the returned value will either
be 0, the ID of a single sensor, or a value that is the sum of two or more sensor IDs. Of course,
this function requires that sensors can add integers as large as $\sum_{x \in S} x$.
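Both aggregation functions can be sketched in a few lines of Python. The ID scheme below is a hypothetical choice that satisfies the stated conditions: sensor i receives ID 2n + 1 + i, so every ID lies in [2n+1, 3n], and any sum of two or more IDs is at least (2n+1) + (2n+2) = 4n + 3 > 3n. A missing sum that falls inside [2n+1, 3n] therefore names exactly one dead sensor. The reduction with `functools.reduce` stands in for the in-network aggregation, which is valid because addition is associative.

```python
from functools import reduce
import operator


def assign_large_ids(n):
    """Hypothetical ID scheme: sensor i gets ID 2n + 1 + i, so all IDs lie in [2n+1, 3n]."""
    return [2 * n + 1 + i for i in range(n)]


def count_test(T, live):
    """Count aggregation: the base station knows |T| and subtracts the live count.
    Returns the number of dead sensors in T (an anonymous counting result)."""
    T = list(T)
    returned = reduce(operator.add, (1 for x in T if x in live), 0)
    return len(T) - returned


def large_id_test(T, live, ids):
    """Identifying ternary test via large-ID summation.
    Returns (0, None), (1, dead_sensor) or (2, None) for 0, 1 or 2+ dead sensors in T."""
    n = len(ids)
    T = list(T)
    expected = sum(ids[x] for x in T)
    returned = reduce(operator.add, (ids[x] for x in T if x in live), 0)
    missing = expected - returned
    if missing == 0:
        return 0, None
    if 2 * n + 1 <= missing <= 3 * n:    # exactly one ID is missing
        return 1, missing - (2 * n + 1)  # invert the ID scheme above
    return 2, None
```

A ternary (0/1/2+) result follows from the count as `min(count_test(T, live), 2)`.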

Thus, we can use dead sensor diagnosis to motivate identifying ternary-result (0/1/2+) group
testing as well as anonymous counting group testing. Of course, if we combine these two
functions to operate on paired responses, we can implement an identifying counting group
testing algorithm. These aggregation functions are not meant to be exhaustive.

3 The Binary Tree Algorithm for Ternary-Result Group Testing

Since it provides a starting point for our more sophisticated algorithms, we review in this 
section the binary tree algorithm for ternary-result group testing with identifying results for 
tainted tests, which was originally presented in the context of MACs [3]. That is, we consider 
the problem of identifying the defective items in a set of items when we can adaptively 
test arbitrary subsets and each test result indicates whether the number of defective items 
contained in the tested subset is zero, one, or at least two. We also provide a simplified 
analysis of its expected performance. 

The main idea of the binary tree algorithm (parameterized by p) is to partition a set
known to be impure into two unequal-sized subsets, containing fractions p and q = 1 - p of
the set's items, and to recursively test each of these subsets. However, the algorithm takes
advantage of one simple optimization: if the first subset in a recursive call turns out to be
pure (i.e., having no defectives), then the second subset must be impure, so we avoid the
top-level testing of the second subset and go immediately to splitting it in two and testing
the two parts.

The original algorithm used p = 0.5, and it has been shown [6] that p ≈ 0.4175 minimizes
the expected number of tests. We make use of the smaller root of the equation $p_2 = (1 - p_2)^2$,
which is $p_2 = (3 - \sqrt{5})/2 \approx 0.38197$, and of $q_2 = 1 - p_2 \approx 0.61803$ (so that $p_2 = q_2^2$).
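The constants used in the analysis that follows can be checked directly. This short Python sketch computes $p_2$, $q_2$, and the worst-case constant $w_2 = -1/\lg p_2$ that appears in Theorem 1, and confirms that $-2/\lg p_2$ and $-1/\lg q_2$ coincide because $p_2 = q_2^2$.

```python
import math

# p2: smaller root of p = (1 - p)**2, i.e. of p**2 - 3p + 1 = 0.
p2 = (3 - math.sqrt(5)) / 2
q2 = 1 - p2
w2 = -1 / math.log2(p2)  # worst-case constant of Theorem 1

if __name__ == "__main__":
    print(f"p2 = {p2:.5f}, q2 = {q2:.5f}, w2 = {w2:.6f}")
    # The two recurrence rates for d = 2, 3 agree since p2 = q2**2 (both equal 2*w2):
    print(-2 / math.log2(p2), -1 / math.log2(q2))
```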

The binary tree algorithm begins by testing the set, S, of items. If the test indicates that
S is pure or tainted (in which case the one defective item will have been identified), then the
algorithm is done. Otherwise, initialize the set L of identified defective items to empty and
proceed with subroutine Identify(S) as follows:

1. Partition S into two subsets, A and B, where |A| = p|S|.

2. Test subset A. 

(a) If A is impure, then recursively invoke Identify(A).
(b) If A is tainted with item z, then add z to list L.



3. If A is pure, then we know that subset B is impure, and so there is no need to test B.
In this case, recursively invoke Identify(B). Otherwise, test subset B.

(a) If B is impure, then recursively invoke Identify(B).

(b) If B is tainted with item z, then add z to list L. 

When partitioning S into A and B, we can select A as consisting of those items whose
ID values are ranked contiguously, 1 through p|S|. The items in A, or B, can be specified
by giving lower and upper limits on ID values. Thus, the binary tree algorithm is concise.
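The steps above can be sketched in Python as follows, with subsets taken as contiguous slices of the ID-sorted item list, mirroring the concise range description. The oracle `make_oracle` and the rounding rule in `split` (keep both parts non-empty) are our own simulation choices; placing the defectives at random positions plays the role of the initial random permutation.

```python
import random


def make_oracle(defectives):
    """Identifying ternary test (0 / 1 with the item / 2+) plus a test counter."""
    calls = [0]

    def test(T):
        calls[0] += 1
        hits = [x for x in T if x in defectives]
        if not hits:
            return 0, None
        if len(hits) == 1:
            return 1, hits[0]
        return 2, None

    return test, calls


def split(S, p):
    """First part gets about p|S| items; both parts stay non-empty."""
    k = min(max(1, round(p * len(S))), len(S) - 1)
    return S[:k], S[k:]


def identify(S, p, test, found):
    """Identify(S) for a set S already known to be impure."""
    A, B = split(S, p)
    result_a, item = test(A)
    if result_a == 2:
        identify(A, p, test, found)
    elif result_a == 1:
        found.append(item)
    else:  # A pure, so B must be impure: recurse without testing B
        identify(B, p, test, found)
        return
    result_b, item = test(B)
    if result_b == 2:
        identify(B, p, test, found)
    elif result_b == 1:
        found.append(item)


def binary_tree(S, p, test):
    """Test S once; run Identify only if S is impure."""
    result, item = test(S)
    if result == 0:
        return []
    if result == 1:
        return [item]
    found = []
    identify(S, p, test, found)
    return found


if __name__ == "__main__":
    p2 = (3 - 5 ** 0.5) / 2
    rng = random.Random(0)
    trials, total = 500, 0
    for _ in range(trials):
        test, calls = make_oracle(set(rng.sample(range(2000), 2)))
        binary_tree(list(range(2000)), p2, test)
        total += calls[0]
    print("average tests for d = 2:", total / trials)
```

For d = 2 the simulated average should land near the roughly 4.427 tests (including the initial test of S) derived in the average-case analysis.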

Theorem 1 $w_2 d \lg n + o(\lg n)$ ternary tests under the identifying model suffice, in the worst
case, to identify all defectives in a set containing n items of which d are defective, where
$w_2 = -1/\lg p_2 \approx 0.720210$.

Proof: We analyze the performance of the binary tree algorithm with $p = p_2$. Let $X_d(n)$ be
the worst-case number of tests required by algorithm Identify(S) when S is a set of n items
of which d turn out to be defective.

For d = 2 and d = 3, we have the following recurrence. (Note that sets with 0 or
1 defective items require no further testing, thus $X_0(n) = X_1(n) = 0$, and that it is assumed
that $X_3(n) \ge X_2(n)$.)

$$X_d(n) = \max\{\, 2 + X_d(p_2 n),\ 1 + X_d(q_2 n) \,\}, \quad d \in \{2, 3\} \qquad (1)$$

If the first term of the recurrence were to be the maximum term, then
$X_d(n) = 2 + X_d(p_2 n) = -(2/\lg p_2)\lg n$. If the second term of the recurrence were to be
the maximum term, then $X_d(n) = 1 + X_d(q_2 n) = -(1/\lg q_2)\lg n$.

Since $p_2 = q_2^2$, these two values coincide, and we see that
$X_2(n) = X_3(n) = -(1/\lg q_2)\lg n = 2w_2 \lg n \approx 1.4404 \lg n$.

For d ≥ 4, we have the following recurrence. (It is assumed that $X_d(n) \ge X_{d-1}(n)$.)

$$X_d(n) = \max\Big\{ \max_{1 \le i \le d-2} \big[ 2 + X_i(p_2 n) + X_{d-i}(q_2 n) \big],\ 2 + X_d(p_2 n),\ 1 + X_d(q_2 n) \Big\} \qquad (2)$$

Consider $X_d(n) = x \lg n + o(\lg n)$; we shall solve for x.

Assume that, for even $1 \le i < d$, $X_i(n) = w_2 i \lg n + o(\lg n)$, and that, for odd $1 \le i < d$,
$X_i(n) = w_2 (i - 1)\lg n + o(\lg n)$.

Consider any d ≥ 4. If the first term of the recurrence were to be the maximum term,
then $x \ge (d - 1)w_2 > 2.16$, since d ≥ 4. If the second term were to be the maximum
term, then $x = -2/\lg p_2 \approx 1.44$. If the third term were to be the maximum term, then
$x = -1/\lg q_2 \approx 1.44$.

Thus, the first term is the maximum term and $X_d(n) \approx d X_2(n)/2 = w_2 d \lg n$, for even d,
and $X_d(n) \approx (d - 1) X_2(n)/2 = w_2 (d - 1)\lg n$, for odd d. ■

Thus, the binary tree algorithm has good worst-case performance. It also has good
average-case performance, as the following theorem¹ shows.

¹ This theorem simplifies a result of [6], and it implies a randomized algorithm with the same
performance if we preface the binary tree algorithm with an initial random permutation of the items.
Theorem 2 On average, when $p = p_2$, Identify requires fewer than $2.64d - 2$ ternary tests
to identify all defectives in a set containing n items of which $d \ge 3$ are defective, for $n \gg d$.
Thus, the binary tree algorithm requires fewer than $2.64d - 1$ ternary tests.

Proof: Let $E_d$ be the average number of tests required by algorithm Identify(S) when S is
a set of n items of which d turn out to be defective and where $n \gg d$.

A set having d defectives will be split into two subsets having i and d - i defectives,
with both subsets being subsequently tested and processed, except when a subset's test
reveals that it has at most one defective, in which case that subset will not be subsequently
processed. Also, when the first subset has no defectives, the second subset is processed but
not tested. The probability that the first subset has no defectives is $q^d$. The probability that
the first subset has one defective is $dpq^{d-1}$. The probability that the second subset has no
defectives is $p^d$. The probability that the second subset has one defective is $dp^{d-1}q$.

For d = 2, we have the following recurrence.

$$E_2 = 2 - q^2 + E_2(p^2 + q^2) \qquad (3)$$

This simplifies to

$$E_2 = \frac{2 - q^2}{1 - p^2 - q^2} = \frac{2 - q^2}{2pq} \approx 3.42705 \qquad (4)$$

Thus, counting the initial test of S, the binary tree algorithm with $p = p_2$ requires
approximately 4.427 tests when d = 2.

For d = 3, we have the following recurrence.

$$E_3 = 2 + q^3(E_3 - 1) + 3pq^2E_2 + 3p^2qE_2 + p^3E_3 \qquad (5)$$

This simplifies to

$$E_3 = \frac{2 - q^3 + 3pqE_2}{1 - q^3 - p^3} \approx 5.91776 \qquad (6)$$

Thus, the binary tree algorithm with $p = p_2$ requires approximately 6.91776 tests when
d = 3.

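For d > 3, we have the following recurrence, where i denotes the number of defectives in
the first subset.

$$E_d = 2 + q^d(E_d - 1) + dpq^{d-1}E_{d-1} + \sum_{i=2}^{d-2}\binom{d}{i}p^iq^{d-i}(E_i + E_{d-i}) + dp^{d-1}qE_{d-1} + p^dE_d \qquad (7)$$

Using the value $E_1 = 0$, this simplifies to

$$E_d = \frac{2 - q^d + \sum_{i=1}^{d-1}\binom{d}{i}p^iq^{d-i}(E_i + E_{d-i})}{1 - q^d - p^d} \qquad (8)$$

Starting with $E_0 = E_1 = 0$ and iterating, this yields the results shown in Table 1.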
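Iterating recurrence (8) takes only a few lines. The Python sketch below (the function name is ours) reproduces the first entries of Table 1 for $p = p_2$ and, with the same code, the value $E_2 = 2 + \sqrt{2} \approx 3.414$ obtained for $p = \sqrt{2} - 1$, which is discussed after the proof.

```python
import math


def expected_tests(d_max: int, p: float) -> list:
    """Iterate recurrence (8): E[d] for 0 <= d <= d_max, with E[0] = E[1] = 0.

    math.comb(d, i) is converted to float inside the product, which overflows
    for d beyond roughly 1000, so this direct form is meant for moderate d.
    """
    q = 1.0 - p
    E = [0.0] * (d_max + 1)
    for d in range(2, d_max + 1):
        numerator = 2.0 - q ** d
        for i in range(1, d):
            weight = math.comb(d, i) * p ** i * q ** (d - i)
            numerator += weight * (E[i] + E[d - i])
        E[d] = numerator / (1.0 - q ** d - p ** d)
    return E


if __name__ == "__main__":
    p2 = (3 - math.sqrt(5)) / 2
    E = expected_tests(10, p2)
    for d in (2, 3, 4, 10):
        print(d, round(E[d], 6))
```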

It is observed that, for large d, $1 - q^d - p^d \approx 1$ and $(E_i + E_{d-i}) \approx 2E_{d/2}$. Together with
the particular values of $E_d$ for $d \le 1000$, this suggests that, for $p = p_2$, $E_d < 2.631d$. We prove here
a slightly weaker result.
Table 1: Expected number of tests used by Identify (p = p_2)

  d     E_d            d     E_d            d      E_d             d       E_d
  2     3.427051       7     16.413785      30     76.926328       300     787.262001
  3     5.917763       8     19.046426      40     103.234985      400     1050.349326
  4     8.520000       9     21.678383      50     129.543603      500     1313.436665
  5     11.147797      10    24.309752      100    261.087360      800     2102.698664
  6     13.780589      20    50.617127      200    524.174671      1000    2628.873328

It is seen that, for all $3 \le d \le 330$, $E_d < 2.64d - 2$. We show by induction that this is
true for all larger d. The numerator of equation (8) contains a weighted sum of $E_i + E_{d-i}$.
By the inductive hypothesis, that weighted sum will total at most $(2.64d - 4)(1 - (p^d + q^d))$,
and the entire numerator will total at most $2 - q^d$ more. We focus on the contributions of
the small pieces $E_j$, for j = 1, 2, 3. $E_j$ contributes $\binom{d}{j}(p^jq^{d-j} + p^{d-j}q^j)E_j$, which we bounded
using $E_j < 2.64j - 2$ to be at most

$$X = d(pq^{d-1} + p^{d-1}q)(0.64) + \binom{d}{2}(p^2q^{d-2} + p^{d-2}q^2)(3.28) + \binom{d}{3}(p^3q^{d-3} + p^{d-3}q^3)(5.92) \qquad (9)$$

But we actually get a contribution of

$$Y = \binom{d}{2}(p^2q^{d-2} + p^{d-2}q^2)(3.42705) + \binom{d}{3}(p^3q^{d-3} + p^{d-3}q^3)(5.91776) \qquad (10)$$

Adding $Y - X$ and reordering the terms, we can bound the numerator as being at most
$(2.64d - 2)(1 - (p^d + q^d)) + W$, where

$$W < 2p^d + q^d - 0.64dp^{d-1}q - 0.64dpq^{d-1} + 0.1471\binom{d}{2}(p^2q^{d-2} + p^{d-2}q^2) - 0.0022\binom{d}{3}(p^3q^{d-3} + p^{d-3}q^3) \qquad (11)$$

Since $E_d < 2.64d - 2$ has been verified directly for all $3 \le d \le 330$, it remains to show that
$W < 0$ for all $d > 330$, which we do by demonstrating that each positive term is more than offset
by a different negative term. Compare the positive term $P = 0.1471\binom{d}{2}p^2q^{d-2}$ to the negative
term $N = -0.0022\binom{d}{3}p^3q^{d-3}$. Their absolute ratio, $|P/N| = \frac{0.1471 \cdot 3q}{0.0022\,p\,(d-2)}$, is less
than 1 for $d \ge 327$. The positive term $0.1471\binom{d}{2}p^{d-2}q^2$ is smaller than the absolute value of the
negative term $-0.0022\binom{d}{3}p^{d-3}q^3$ for $d > 126$. The remaining positive terms, $2p^d$ and $q^d$, are
neutralized by $-0.64dp^{d-1}q$ and $-0.64dpq^{d-1}$ when, respectively, $d > 2$ and $d > 3$. Hence $W < 0$,
so the numerator of (8) is less than $(2.64d - 2)(1 - (p^d + q^d))$. We conclude that, for $p = p_2$,
$E_d < 2.64d - 2$. ■

Using different values of p yields different results. To minimize $E_2$, a value of $p = \sqrt{2} - 1 \approx$
0.4142 is best [6], requiring 3.414 tests. To minimize $E_3$, p ≈ 0.4226 is best and requires 5.884
tests. To minimize $E_4$, p ≈ 0.4197 is best and requires 8.482 tests. The value $p = p^* \approx 0.41750778$
is asymptotically optimal for large d. The curves are fairly flat, so, although one could
tune p depending on the expected distribution of the values of d, choosing $p = p^*$ is a good
choice for most distributions and, as noted by Hofri [6], is optimal for the naturally arising
distribution when the defective items are i.i.d., requiring approximately 2.6229d tests.



4 The Deferral Algorithm



In this section, we describe how to substantially improve on the average case of the binary
tree algorithm under the assumption that we have a good approximation of the number,
d, of defective items. This algorithm is especially useful for the Multiple Access Channel
problem.

Our algorithm, which we call Deferral, begins with an approach used by Greenberg and
Ladner [4], in which we use knowledge of the approximate number of defective items to
randomly partition the set of items into a set, L, of buckets such that the expected
number of defective items in each bucket is a constant. This process is called a spreading
action and, for |L| = sd, the parameter s is called the spread factor. Greenberg and Ladner's
algorithm performs a spreading action using an appropriate spread factor (they recommend
s = 0.8), performs a test on each bucket, and then applies the binary tree algorithm to each
bucket that has a 2+ result.

Our approach does something similar, but augments it with a new deferral technique 
that may at first seem counter-intuitive. We also perform a spreading action, perform a 
test for each bucket, and apply the binary tree algorithm recursively to any bucket with a 
2+ result, except that we cut recursive calls short in certain cases and defer to the future 
all items whose status remains unclear from all such calls. We then recursively apply the 
entire algorithm on these deferred items. As we show in our analysis, this is a case when 
procrastination provides asymptotic improvements, for this deferral algorithm has a better 
average-case performance than does the direct do-it-now approach of Greenberg and Ladner. 

Deferral proceeds as follows. 

1. Initialize a deferral bucket to empty. 

2. For each bucket K in set L, identify some of the defective items in K (and defer others) 
as follows. 

Test K. If the test shows that K is pure or tainted, all defective items of K will 
have been identified. Otherwise, use algorithm BucketSearch on bucket K. 

3. Finally, if the deferral bucket is non-empty then recursively apply Deferral to the set 
of items in the deferral bucket. 

Algorithm BucketSearch proceeds as follows. 

1. Partition K into a first portion A having fraction p of the items in K, and a second
portion B having the remainder fraction q = 1 - p of K's items.

2. Test A. One of three results will occur:

(a) If A is pure, then recursively invoke BucketSearch(B).

(b) If A is tainted, then the lone defective item in A will have been identified. In this
case, test B and, only when B is impure, recursively invoke BucketSearch(B).

(c) If A is impure, recursively invoke BucketSearch(A). Finally, merge B with the
deferral bucket.
It might not be immediately obvious, but this algorithm can be made concise, using O(1)
words of memory per test element (one of which, for example, can keep the state of whether
an item is being deferred or not).
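The Deferral and BucketSearch steps can be traced with a small simulation. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: the identifying ternary test is simulated from a known defective set, the names `ternary_test`, `bucket_search`, and `deferral` are hypothetical, buckets are plain lists rather than O(1)-word item states, and each round sizes its spreading step from the true number of remaining defectives (a real system would use an estimate instead). Because every case of BucketSearch makes at most one recursive call, the recursion is written as a loop.

```python
import random

def ternary_test(items, defective):
    """Simulated identifying ternary test: (0, None) if pure, (1, x) if tainted
    with lone defective x, (2, None) if impure (two or more defectives)."""
    hits = [x for x in items if x in defective]
    if not hits:
        return 0, None
    if len(hits) == 1:
        return 1, hits[0]
    return 2, None

def bucket_search(K, defective, p, found, deferred, stats):
    """BucketSearch on a bucket K that is known to be impure."""
    while True:
        cut = min(len(K) - 1, max(1, round(p * len(K))))  # |A| ~ p|K|, both parts non-empty
        A, B = K[:cut], K[cut:]
        stats["tests"] += 1
        r, x = ternary_test(A, defective)
        if r == 0:                 # (a) A pure: B holds all of K's defectives
            K = B
        elif r == 1:               # (b) A tainted: x identified; now test B
            found.add(x)
            stats["tests"] += 1
            rb, xb = ternary_test(B, defective)
            if rb == 1:
                found.add(xb)
            if rb != 2:
                return
            K = B                  # B impure: keep searching it
        else:                      # (c) A impure: search A, defer all of B
            deferred.extend(B)
            K = A

def deferral(items, defective, s=0.8, p=0.479, seed=0):
    """Spread items over about s*d buckets, test each bucket, run BucketSearch
    on impure buckets, then repeat the whole procedure on the deferred items."""
    rng = random.Random(seed)
    found, stats = set(), {"tests": 0, "rounds": 0}
    while items:
        stats["rounds"] += 1
        # Simulation shortcut (assumption): size the spreading step from the true
        # number of remaining defectives instead of from an estimate.
        d = len(defective.intersection(items))
        buckets = [[] for _ in range(max(1, round(s * d)))]
        for x in items:
            rng.choice(buckets).append(x)
        deferred = []
        for K in buckets:
            stats["tests"] += 1
            r, x = ternary_test(K, defective)
            if r == 1:
                found.add(x)
            elif r == 2:
                bucket_search(K, defective, p, found, deferred, stats)
        items = deferred
    return found, stats
```

For example, `deferral(list(range(5000)), defective)` with a random 40-item `defective` set returns every defective item, together with the total number of tests in `stats["tests"]`.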

4.1 Analysis of the Deferral Algorithm 

Let $P_s(k)$ be the probability of a bucket containing exactly k defective items, given that we
are using |L| = sd buckets, i.e., we have a spread factor of s. Then

$$P_s(k) = \binom{d}{k}\left(\frac{1}{sd}\right)^{k}\left(1 - \frac{1}{sd}\right)^{d-k}.$$

For example, if we use a spread factor of s = 0.75, then $P_s(0) < 0.2636$, $P_s(1) < 0.3515$,
$P_s(2) < 0.2344$, ..., $P_s(6) < 0.0021$, and $P_s(7) < 0.0004$. We observe that we expect
99.9% of all buckets to contain fewer than seven defective items in this case (and this is true for
all spread factors greater than 0.5). Furthermore, $P_s(i)$ is monotonically decreasing for $i \ge 2$.
Therefore, in analyzing the expected behavior of algorithms that use a spreading step with a
reasonable spread factor, the expected number of tests is dominated by the expected number
of tests performed on buckets with fewer than seven defective items.
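The bucket-occupancy probabilities can be reproduced directly. This sketch assumes, as the text does, that each defective item lands in one of the $sd$ buckets independently and uniformly at random; `bucket_prob` is the exact binomial expression and `bucket_prob_limit` is its large-$d$ Poisson limit with mean $1/s$ (both names are hypothetical).

```python
from math import comb, exp, factorial

def bucket_prob(k, d, s):
    """P_s(k): chance that one bucket receives exactly k of d defectives when
    each defective lands in one of |L| = s*d buckets uniformly at random."""
    m = s * d
    return comb(d, k) * (1 / m) ** k * (1 - 1 / m) ** (d - k)

def bucket_prob_limit(k, s):
    """Large-d limit of P_s(k): Poisson with mean 1/s defectives per bucket."""
    lam = 1 / s
    return exp(-lam) * lam ** k / factorial(k)
```

Evaluating `bucket_prob_limit(k, 0.75)` for $k = 0, \dots, 7$ and rounding to four places gives 0.2636, 0.3515, 0.2343, 0.1041, 0.0347, 0.0093, 0.0021, and 0.0004, consistent with the bounds quoted above.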



Analyzing the Expected Number of Tests per Bucket 

We begin by evaluating the expected number, $E_d$, of tests performed in a bucket containing
d defectives (not counting the global test for the bucket or future deferred tests for items
currently in the bucket). We determine $E_d$ for small values of d. By construction, $E_0 =
E_1 = 0$. For $d > 1$, we consider the cases x-y that arise when partitioning a set containing
d defective items into two subsets that turn out to contain, respectively, x and y defective
items. If d = 2, then the 2-0 case entails 1 test and a recursive call (and a deferral of a pure
set), the 1-1 case entails 2 tests, and the 0-2 case entails 1 test and a recursive call. Thus,
letting $q = 1 - p$,

$$E_2 = p^2(E_2 + 1) + 2pq(2) + q^2(E_2 + 1) = p^2E_2 + (p^2 + 2pq + q^2) + 2pq + q^2E_2,$$

so that

$$E_2 = \frac{1 + 2pq}{1 - p^2 - q^2} = \frac{1 + 2pq}{2pq}.$$

Likewise, if d = 3, the 3-0 case entails 1 test and a recursive call (and a deferral of a pure
set), the 2-1 case entails 1 test and a recursive call on a 2-defective set (and a deferral of a
1-defective set), the 1-2 case entails 2 tests and a recursive call on a 2-defective set, and the
0-3 case entails 1 test and a recursive call. Thus,

$$E_3 = p^3(E_3 + 1) + 3p^2q(E_2 + 1) + 3pq^2(E_2 + 2) + q^3(E_3 + 1) = \frac{1 + 3pq^2 + (3p^2q + 3pq^2)E_2}{1 - p^3 - q^3}.$$

Similarly,

$$E_4 = \frac{1 + 4pq^3 + (4p^3q + 4pq^3)E_3 + 6p^2q^2E_2}{1 - p^4 - q^4}.$$

Likewise,

$$E_5 = \frac{1 + 5pq^4 + (5p^4q + 5pq^4)E_4 + 10p^3q^2E_3 + 10p^2q^3E_2}{1 - p^5 - q^5}.$$

Moreover,

$$E_6 = \frac{1 + 6pq^5 + (6p^5q + 6pq^5)E_5 + 15p^4q^2E_4 + 20p^3q^3E_3 + 15p^2q^4E_2}{1 - p^6 - q^6}.$$

Finally (which will be sufficient for our analysis),

$$E_7 = \frac{1 + 7pq^6 + (7p^6q + 7pq^6)E_6 + 21p^5q^2E_5 + 35p^4q^3E_4 + 35p^3q^4E_3 + 21p^2q^5E_2}{1 - p^7 - q^7}.$$

But this is only for the first round. We still need to account for the expected number of 
defective items deferred from this round to future rounds. 
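The recurrences for $E_2$ through $E_7$ follow one pattern, so they can be generated for any $d$. The sketch below (function name hypothetical) encodes the case analysis used above: with $j$ of the $d$ defectives in $A$, the $j = 1$ case costs an extra test on $B$ and continues on $B$, the cases $2 \le j \le d-1$ continue on $A$ and defer $B$, and the cases $j = 0$ and $j = d$ return to a $d$-defective set, which produces the $1 - p^d - q^d$ divisor. It is meant as a cross-check of the closed forms, not as part of the algorithm.

```python
from math import comb

def deferral_expected_tests(p, dmax):
    """E_0..E_dmax for BucketSearch with split fraction p (first round only)."""
    q = 1 - p
    E = [0.0, 0.0]                                   # E_0 = E_1 = 0
    for d in range(2, dmax + 1):
        # Every split costs one test on A; the j = 1 split costs one more (on B).
        num = 1.0 + d * p * q ** (d - 1)
        for j in range(1, d):                        # j defectives in A, d - j in B
            w = comb(d, j) * p ** j * q ** (d - j)
            # j == 1: continue on B (d - 1 defectives); j >= 2: continue on A, defer B.
            num += w * (E[d - 1] if j == 1 else E[j])
        E.append(num / (1 - p ** d - q ** d))        # j = 0 and j = d moved to the left
    return E
```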



Analyzing the Expected Number of Deferred Defective Items 

Let $D_d$ denote the expected number of defective items deferred in a bucket with d defective
items. Certainly, since we are guaranteed to find at least 2 defective items for any bucket with
$d \ge 2$, we can bound $D_d \le d - 2$ for $d \ge 2$. Moreover, we trivially have that $D_0 = D_1 = 0$
(and $D_2 = 0$ by the bound just given). We evaluate $D_d$ for some small values of d, beginning with $D_3$.

When d = 3, the 3-0, 1-2, and 0-3 cases all entail recursive calls, but only the 2-1 case
causes a defective item to be deferred. Thus,

$$D_3 = p^3D_3 + 3p^2q + q^3D_3 = \frac{3p^2q}{1 - p^3 - q^3} = p,$$

since $1 - p^3 - q^3 = 3p^2q + 3pq^2 = 3pq$.

For d = 4, the 4-0 and 0-4 cases both entail recursive calls, the 3-1 case entails a 3-defective
recursive call and 1 deferral, the 2-2 case entails 2 deferrals, and the 1-3 case
entails a 3-defective recursive call. Thus,

$$D_4 = p^4D_4 + 4p^3q + (4p^3q + 4pq^3)D_3 + 12p^2q^2 + q^4D_4 = \frac{4p^3q + 12p^2q^2 + (4p^3q + 4pq^3)D_3}{1 - p^4 - q^4}.$$

Likewise,

$$D_5 = \frac{5p^4q + 20p^3q^2 + 30p^2q^3 + (5p^4q + 5pq^4)D_4 + 10p^3q^2D_3}{1 - p^5 - q^5}.$$

Finally (which will be sufficient for our analysis),

$$D_6 = \frac{6p^5q + 30p^4q^2 + 60p^3q^3 + 60p^2q^4 + (6p^5q + 6pq^5)D_5 + 15p^4q^2D_4 + 20p^3q^3D_3}{1 - p^6 - q^6}.$$



It does not result in elegant equations, but we can nevertheless combine this analysis
with the previous bounds on $E_d$ and $P_s(k)$ to derive the expected number of tests performed
by our algorithm. For example, with a spread factor of s = 0.8 and a split parameter of
p = 0.479, we obtain that the expected number of tests is less than 2.054d.
4.2 Estimating the Number of Defectives

Greenberg and Ladner [1] give a simple repeated doubling algorithm for estimating the
number of transmitting devices, d, in a set. In their algorithm, each transmitting device
repeatedly transmits with probability $2^{-i}$, for i = 1, 2, ..., until there is a non-collision. It
then sets its estimate of the number of defectives as $\hat{d} = 2^i$.

For combinatorial testing to determine the number of defective items, d, in a large set of
items, this is equivalent to repeatedly testing a random set of size $n2^{-i}$, for i = 1, 2, ..., until
a test obtains a 0- or 1-result. Unfortunately, this simple approximation is not sufficiently
accurate for our purposes, so we provide in this section a simple improvement of the doubling
algorithm, which increases the accuracy of the estimate while only increasing the number of
tests by a small additive factor.

We begin by applying the simple doubling algorithm. This algorithm is 99.9% likely to
use O(log d) tests and produce an estimate, $\hat{d}$, such that $d/32 \le \hat{d} \le 32d$. However, the
estimate is within a factor of 2 of d only about 75% of the time. (It varies from approximately
65% to 90%, depending on how close d is to a power of 2.) While this is insufficient to
produce a useful estimate of d for the sake of computing a spread factor, it is sufficient as a
first step toward a better approximation of d.

Let us, therefore, assume we have computed the estimate $\hat{d}$. We next perform a sequence
of experiments, for i = j, j + 1, ..., where experiment i involves choosing a constant number, c,
of random subsets of size $n2^{-i/a}$ and performing a test for each one, where $j = \max\{1, a(\lg \hat{d} -
5)\}$ with a being a small integer such as 2 or 4. We stop the sequence of experiments as soon as
one of the c tests returns a result of 0 or 1. We then use the value of i to produce a refined
estimate, d', for d. We use $d' = f(a,c) \cdot 2^{i/a}$, where f is a normalizing function so that
$E[d'] = d$.

The probability that all c subsets for experiment i contain collisions quickly approaches
$(1 - (t + 1)e^{-t})^c$, where $t = d/2^{i/a}$, so experiment i ends the sequence with probability
approaching $1 - (1 - (t + 1)e^{-t})^c$. This fact can then be used to find a good estimate
of d, based on the values of a and c. When a = 4 and c = 8, using f(a,c) = 4.3 results
in the estimate being within a factor of 2 of d about 99.3% of the time (varying from about 98%
to 100%, depending on d). Moreover, combining this estimate algorithm with our deferred
binary tree algorithm results in a testing algorithm that uses an expected $2.08d + O(\log d)$
tests, and which does not need to know the value of d in advance.
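The two-stage estimator can be checked with a Monte Carlo sketch. It assumes a Bernoulli stand-in for a random subset of size $n2^{-i/a}$ (each item is kept independently with probability $2^{-i/a}$, so only the $d$ defectives need to be simulated), takes $\lg \hat{d}$ directly from the doubling stage, and uses the parameters $a = 4$, $c = 8$, and $f(a,c) = 4.3$ quoted above. The function names are hypothetical, and only the simulator, not the estimator, knows $d$.

```python
import random

def subset_defectives(d, density, rng):
    """Number of defectives in a random test set that keeps each item with
    probability `density` (a Bernoulli stand-in for a fixed-size subset)."""
    return sum(rng.random() < density for _ in range(d))

def doubling_exponent(d, rng):
    """Simple doubling: test subsets of density 2^-i for i = 1, 2, ... until a
    test returns 0 or 1; the estimate is d_hat = 2^i, so return i = lg d_hat."""
    i = 1
    while subset_defectives(d, 2.0 ** -i, rng) > 1:
        i += 1
    return i

def refined_estimate(d, rng, a=4, c=8, f=4.3):
    """Experiments i = j, j + 1, ...: each runs c tests on subsets of density
    2^(-i/a); the first one with a 0- or 1-result yields d' = f * 2^(i/a)."""
    i = max(1, a * (doubling_exponent(d, rng) - 5))  # j = max{1, a(lg d_hat - 5)}
    while True:
        density = 2.0 ** (-i / a)
        if any(subset_defectives(d, density, rng) <= 1 for _ in range(c)):
            return f * 2.0 ** (i / a)
        i += 1
```

Running `refined_estimate(300, random.Random(2024))` many times and counting how often the result lies in $[d/2, 2d]$ gives a frequency that can be compared with the 99.3% figure above; the exact value depends on $d$ and on the random seed.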

4.3 Reducing the Randomness of the Deferral Algorithm 

In this subsection, we show how to reduce the randomness needed for the deferral algorithm, 
while keeping it concise. In particular, we do not need O(log n) random bits associated with
each defective item; we can use an expected O(log n) random bits associated with a group
controller instead. Moreover, even with this reduced randomness, we show that we will make
only O(d) tests, with high probability, $1 - O(1/d)$.

The main idea of our modified algorithm is to apply the Deferral algorithm, as described
above, but use a random hash function to define the top-level partitioning to be performed.
Indeed, the top-level distribution of our algorithm is closely related to the hashing of d out
of n elements into a table of size O(d), in that mapping items to cells without collisions is
quite helpful (corresponding to identifying tests in our case). The main difference between
our problem and the hashing problem is that, in the case of a collision (corresponding to
an impure test set in our case), we do not know which items or even how many items have
collided.

Our algorithm is the same as the deferral algorithm, except that instead of using random 
bits associated with the items, we define the top-level buckets as follows: 

1. Set m = 2d, where d is the number of remaining defective items.

2. Choose a prime number r > m and choose a random set of integers, $\{a_1, a_2, a_3\}$, from
the interval [1, r - 1], and a random integer, b, from the interval [0, r - 1]. Define h as

$$h(x) = \left((a_3x^3 + a_2x^2 + a_1x + b) \bmod r\right) \bmod m.$$

Comment: It is well known (e.g., see p5]) that such a hash function, h, comes from a
4-universal family of hash functions, which implies that the probability that h maps
four different items $x_1, x_2, x_3, x_4$ to four specific (but not necessarily distinct)
values $y_1, y_2, y_3, y_4$ is $1/m^4$. That is, the assignment of items to hash values is
four-wise independent. Note further that a function like h is defined using O(1)
parameters and can be evaluated in O(1) time; hence, such functions can be used
in a concise testing algorithm.

3. Choose parameters $a_3$, $a_2$, $a_1$, and b to define a random hash function h, as defined
above. Apply the deferral algorithm to each bucket, where each $y \in [0, m - 1]$ defines
a bucket, $T_y$, that contains every item x from the set of items, S, such that h(x) = y.

Comment: We assume that there is enough state information associated with each of the
items for an item to "know" that it is no longer in S, so that we can express the
test sets for each iteration using O(1)-sized expressions.

4. Remove from S each item x that is in a tainted set $T_y$, taking note of the item z
identified as defective by the test for $T_y$ (recall that we assume the test for such a $T_y$
is an identifying test), and decrement d by the number of tainted sets.

5. If d = 0 after removing all such items, then the algorithm has succeeded and is done.
Otherwise, repeat the above steps for the remaining items in S.
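The iteration above can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: tests are simulated from a known defective set, d is taken as known, the hash uses a prime r that exceeds every item ID as well as m (with r below the IDs, two IDs congruent modulo r would always share a bucket), and the names `random_cubic_hash` and `hashed_identify` are hypothetical.

```python
import random

def next_prime_above(m):
    """Smallest prime r > m (trial division; adequate for a small sketch)."""
    r = m + 1
    while r < 2 or any(r % f == 0 for f in range(2, int(r ** 0.5) + 1)):
        r += 1
    return r

def random_cubic_hash(m, max_id, rng):
    """h(x) = ((a3 x^3 + a2 x^2 + a1 x + b) mod r) mod m for a prime r.
    Assumption: r also exceeds every item ID, so distinct IDs differ mod r."""
    r = next_prime_above(max(m, max_id))
    a1, a2, a3 = (rng.randrange(1, r) for _ in range(3))
    b = rng.randrange(r)
    return lambda x: ((a3 * x ** 3 + a2 * x ** 2 + a1 * x + b) % r) % m

def hashed_identify(items, defective, seed=0):
    """Repeat: hash the remaining items into m = 2d buckets with a fresh h, test
    every bucket, and remove each tainted bucket (its test names the defective)."""
    rng = random.Random(seed)
    S, found, tests, iterations = set(items), set(), 0, 0
    max_id = max(S, default=0)
    d = len(defective)          # assumption: d is known to the algorithm
    while d > 0:
        iterations += 1
        m = 2 * d
        h = random_cubic_hash(m, max_id, rng)
        buckets = {}
        for x in S:
            buckets.setdefault(h(x), []).append(x)
        tests += m              # one test per bucket y in [0, m - 1], empty ones included
        for T in buckets.values():
            hits = [x for x in T if x in defective]
            if len(hits) == 1:  # tainted bucket: the identifying test reveals hits[0]
                found.add(hits[0])
                S.difference_update(T)
                d -= 1
    return found, tests, iterations
```

On 20,000 items with 200 random defectives, `hashed_identify` identifies every defective, and its returned test count can be compared with the bound on T(d) that follows.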

Let T(d) denote the expected number of tests performed by the above algorithm, and
let X be the number of defective items that are not detected in the first iteration of our
algorithm (i.e., they belong to impure test sets). Since there are d defective items and 2d
test sets, and since, by the pairwise independence of h, each defective item shares its test set
with one of the other d - 1 defective items with probability at most (d - 1)/(2d) < 1/2, we have
E(X) ≤ d/2. Thus, by the linearity of expectation, T(d) satisfies the following
recurrence equation:

$$T(d) \le T(d/2) + 2d,$$

which implies that T(d) is at most 4d. In fact, we can prove that the number of tests used
by the above algorithm is O(d) with high probability.
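Unrolling the recurrence shows where the constant 4 comes from; this worked step is added for illustration and treats the recurrence as holding all the way down, with T(0) = 0:

$$T(d) \le 2d + T(d/2) \le 2d + d + T(d/4) \le \cdots \le 2d\left(1 + \frac{1}{2} + \frac{1}{4} + \cdots\right) = 4d.$$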

Let us begin our justification of this fact by analyzing a single iteration of our algorithm. 
Let d denote the number of defectives present at the beginning of an iteration i, and say 
that iteration i is good if the number of defective items detected in iteration i is at least d/4. 
Otherwise, iteration i is bad. 
Lemma 1 The probability that an iteration i is bad is at most $\min\{2/3, 70/d^2\}$.



Proof: Similar to our usage above, let X be the number of defective items that are not detected in the
given iteration i and note that E(X) ≤ d/2. Then, by Markov's inequality (e.g., see [25]),

$$\Pr(X > 3d/4) \le \frac{E(X)}{3d/4} \le \frac{d/2}{3d/4} = \frac{2}{3}.$$

Note that X can be written as $X = X_1 + \cdots + X_d$, a random variable defined as the
sum of d 4-wise independent 0/1 random variables, with expected value $\mu = E(X) \le d/2$
and variance $\sigma^2 \le d/4$, since the $X_i$ are pairwise independent and each $X_i$ is 1 with probability at most 1/2.
By an inequality due to Schmidt et al. [26], which requires that the variables defining X be
4-wise independent,

$$\Pr(X > 3d/4) = \Pr(X - d/2 > d/4) \le \Pr(|X - \mu| > d/4) \le 2\left(\frac{4\sigma^2}{e(d/4)^2}\right)^2 \le 2\left(\frac{d}{e(d/4)^2}\right)^2 = \frac{512}{e^2d^2} < \frac{70}{d^2}.$$
Combining the above bounds proves the lemma. ■ 

Thus, if there are d defectives remaining at the beginning of an iteration i, then with high
probability there will be at most (3/4)d defectives remaining after the iteration completes.

Theorem 3 Given a set of n items with d defectives, the number of tests performed by the
reduced-randomness ternary-result group testing algorithm is O(d) with probability $1 - O(1/d)$.

Proof: For the sake of the analysis, we divide the iterations of the randomized ternary-result
group testing algorithm into two phases:

• Phase 1: Each iteration i such that $i < \log_{4/3}\log d$ and the number of undetected
defectives is more than $d/\log d$ at the beginning of the iteration.

• Phase 2: The remaining iterations.

Let us analyze each phase separately. By Lemma 1, the probability that a particular iteration
i in Phase 1 is bad is at most $\min\{2/3,\ 70\log^2 d/d^2\}$, since more than $d/\log d$ defectives
remain at its start. Thus, the probability that any iteration
in Phase 1 is bad is at most $\min\{(2/3)\log_{4/3}\log d,\ 70\log^2 d\,\log_{4/3}\log d/d^2\}$, which is $O(1/d)$.
That is, with high probability, all the iterations in Phase 1 are good, which implies that,
with at least the same probability, the number of tests performed in Phase 1 is O(d) and
the number of defectives remaining at the beginning of Phase 2 is at most $d/\log d$. So, let
us assume that this many defectives remain at the beginning of Phase 2.

Unfortunately, we cannot claim with high probability that all the iterations in Phase 2
are good. But we do know that once we have $g = \log_{4/3}(d/\log d)$ good iterations in Phase 2,
we are done. Let Z denote the number of iterations we make in Phase 2. By Lemma 1,
E(Z) ≤ 3g. Note further that whether any given iteration is good is independent of whether
any other iteration is good (since we use a different random hash function, h, for each
iteration). Thus, we can use a Chernoff bound (e.g., see [23]) to show that

$$\Pr(Z > 8g) = \Pr(Z > (1 + 5/3)3g) \le \Pr(Z > (1 + 5/3)E(Z)) < 2^{-(5/3)3g} < 2^{-g} < (\log d/d)^2.$$

Thus, Phase 2 uses at most $O(\log(d/\log d))$ iterations, with probability at least $1 - O(1/d)$.
Since we are assuming that each iteration in Phase 2 will require at most $O(d/\log d)$ tests
(which we have already shown to hold with high probability), this implies that Phase 2 will
require O(d) tests with high probability, that is, with probability at least $1 - O(1/d)$.
Combining this bound with the bound for Phase 1 completes the proof. ■

Thus, assuming we know the value of d, the randomized ternary-result group testing
algorithm uses O(d) tests, with high probability. This fact can itself be used to estimate
d, of course, by a simple doubling strategy. We start such a strategy by assuming that the
number of defectives is at most d = 2 and we run our algorithm based on this assumption,
except that we stop the algorithm short if it uses more than cd tests, where c is the constant
"hiding" behind the big-Oh in the high-probability bound on the number of tests needed.
We then double the value of d and repeat. Since we double the number of tests with each
round, we will quickly come to a round that succeeds within the claimed number of tests.
Moreover, since the number of tests allowed doubles with each round, the total is less than
twice the budget of the final round, so the total number of tests is O(d) with high probability. Therefore, we can achieve this
bound, with high probability, even without knowing the value of d in advance. Of course,
there is a trivial lower bound of Ω(d) tests for any ternary-result group testing algorithm
with identifying tests in the tainted case, so our performance bound is within a constant
factor of optimality, with high probability.
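The doubling strategy can wrap any budgeted run of the algorithm. In this sketch, `run_with_budget(d_guess, budget)` is a hypothetical callback that runs the algorithm under the assumption that at most `d_guess` defectives are present and stops after `budget` tests; it returns `(result, tests_used)`, with `result` set to `None` when the run was cut short. Since the budgets $2c, 4c, \dots, 2^kc$ form a geometric series, the total is less than twice the final budget.

```python
def run_with_unknown_d(run_with_budget, c):
    """Guess d = 2, 4, 8, ...; give each run a budget of c * guess tests and
    stop at the first run that finishes within its budget."""
    d_guess, total = 2, 0
    while True:
        result, used = run_with_budget(d_guess, c * d_guess)
        total += used                      # tests spent, including aborted runs
        if result is not None:
            return result, total, d_guess
        d_guess *= 2
```

For instance, a stub run that needs 111 tests succeeds once the guess reaches 32 (budget 128) when c = 4, after spending 8 + 16 + 32 + 64 tests on aborted rounds.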

5 Our Anonymous Algorithm 

In this section, we discuss an efficient concise deterministic ternary-result group testing
algorithm for the case in which a test of a tainted set does not identify the defective item.
Consider algorithm AN(S), shown in Figs. 1 and 2.

Subroutine Reduce reduces the original problem to one of identifying the d defective items
in a collection L of d tainted subsets. Note that Reduce is essentially our earlier Identify
algorithm, in which testing a tainted set immediately identified the defective item. Here,
we require additional testing to identify the defective item. When d = 2, subroutine Final2
iteratively reduces the sizes of the two sets in L until they are singletons. When $d \ge 3$,
subroutine Final3 iteratively reduces the sizes of three of the sets in L until at most two of
the d sets are non-singleton, and then uses either Final2 or binary search to reduce the
remaining set(s) to singleton(s).

All subsets can be selected so that the items of each subset have ID value ranks that 
are contiguous. All tests involve the union of at most three subsets, each of which can be 
specified as consisting of items whose ID values are in a specified range. Thus, algorithm 
AN is concise. 
Algorithm AN(S)
    // Given: set S of items
    // Return: identity of all defective items
    if test(S) ≤ 1 then identify the defective (if any) via binary search and exit
    list L ← ∅
    Reduce(S)
    if list L has only 2 sets, A and B then Final2(A, B)
    else Final3(L)

Subroutine Reduce(S)
    // Given: set S of items that includes at least 2 defective items
    // Return: list L of disjoint subsets of S that each contain one defective item
    p2 ← 0.38196601
    Partition S into two subsets, A and B, where |A| = p2|S|
    t1 ← test(A)
    if t1 ≥ 2 then Reduce(A)
    if t1 = 1 then add A to list L
    if t1 = 0 then t2 ← 2
    else t2 ← test(B)
    if t2 ≥ 2 then Reduce(B)
    if t2 = 1 then add B to list L

Subroutine Final2(A, B)
    // Given: two disjoint tainted sets
    // Return: identity of the 2 defective items
    p3 ← 0.3176722   // q3 = (1 - p3)
    while |A| > 1 and |B| > 1   // Start with sets (A, B) having sizes (x, y)
        Partition A into A1 and A2, where |A1| = p3|A|
        Partition B into B1 and B2, where |B1| = p3|B|
        t1 ← test(A1 ∪ B1)
        if t1 = 0 then (A, B) ← (A2, B2)   // R0: sizes (q3x, q3y), 1 test
        else if t1 = 1 then
            t2 ← test(A1)
            if t2 = 0 then (A, B) ← (A2, B1)   // R1: sizes (q3x, p3y), 2 tests
            else (A, B) ← (A1, B2)   // R1: sizes (p3x, q3y), 2 tests
        else /* (t1 = 2) */ (A, B) ← (A1, B1)   // R2: sizes (p3x, p3y), 1 test
    use binary search to identify defectives in the (at most 1) set of A and B whose size > 1
Figure 1: Analysis algorithm using anonymous ternary tests 
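The d = 2 path of algorithm AN (Reduce followed by Final2, as in Figure 1) can be sketched with simulated anonymous tests. Assumptions: sets are Python lists of contiguous IDs, an anonymous test returns min(count, 2) without naming any item, and `split` rounds $p|S|$ to an integer while keeping both parts non-empty (the figure assumes exact fractions). Names such as `make_tester`, `reduce_`, and `final2` are hypothetical, and Final3 is not included.

```python
def make_tester(defective):
    """Anonymous ternary test: 0, 1, or 2 (meaning >= 2); it never names an item."""
    counter = {"tests": 0}
    def test(items):
        counter["tests"] += 1
        return min(2, sum(1 for x in items if x in defective))
    return test, counter

def split(items, frac):
    """First part gets about frac * |items| items; both parts stay non-empty."""
    cut = min(len(items) - 1, max(1, round(frac * len(items))))
    return items[:cut], items[cut:]

def binary_search(T, test):
    """Find the lone defective of a tainted list T."""
    while len(T) > 1:
        A, B = split(T, 0.5)
        T = A if test(A) == 1 else B
    return T[0]

def reduce_(S, test, L, p2=0.38196601):
    """Reduce: split S (at least 2 defectives) until every part added to L is tainted."""
    A, B = split(S, p2)
    t1 = test(A)
    if t1 >= 2:
        reduce_(A, test, L, p2)
    if t1 == 1:
        L.append(A)
    t2 = 2 if t1 == 0 else test(B)     # A pure means B holds both defectives
    if t2 >= 2:
        reduce_(B, test, L, p2)
    if t2 == 1:
        L.append(B)

def final2(A, B, test, p3=0.3176722):
    """Final2: shrink two tainted lists together, then binary-search what is left."""
    while len(A) > 1 and len(B) > 1:
        A1, A2 = split(A, p3)
        B1, B2 = split(B, p3)
        t1 = test(A1 + B1)
        if t1 == 0:
            A, B = A2, B2                                      # R0
        elif t1 == 1:
            A, B = (A2, B1) if test(A1) == 0 else (A1, B2)     # R1
        else:
            A, B = A1, B1                                      # R2
    return {binary_search(A, test), binary_search(B, test)}
```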
5.1 Correctness of Final3's Reduction Process

Final3 partitions each of three tainted non-singleton sets into two subsets, a relatively small
subset and a relatively large subset, and determines which three of the
six newly created subsets are tainted. It first tests the union of the three smaller subsets. If
this union is pure, then the three larger subsets are tainted. Otherwise, one or two further
tests suffice to make the determination, as shown in the following lemmata.
Subroutine Final3(L)
    // Given: list L of d ≥ 3 disjoint tainted sets
    // Return: identity of the d defective items
    p4 ← 0.27550804   // q4 = (1 - p4)
    while ∃ at least three non-singleton sets in L
        (a, b, c) ← indices of the largest three non-singleton sets in L
        // Start with sets (La, Lb, Lc) having sizes (x, y, z)
        Partition La into A1 and A2, where |A1| = p4|La|
        Partition Lb into B1 and B2, where |B1| = p4|Lb|
        Partition Lc into C1 and C2, where |C1| = p4|Lc|
        t1 ← test(A1 ∪ B1 ∪ C1)
        if t1 = 0 then (La, Lb, Lc) ← (A2, B2, C2)   // R0: sizes (q4x, q4y, q4z), 1 test
        else if t1 = 1 then
            t2 ← test(A1 ∪ B2)
            if t2 = 0 then (La, Lb, Lc) ← (A2, B1, C2)   // R1: sizes (q4x, p4y, q4z), 2 tests
            else if t2 = 1 then (La, Lb, Lc) ← (A2, B2, C1)   // R1: sizes (q4x, q4y, p4z), 2 tests
            else /* (t2 = 2) */ (La, Lb, Lc) ← (A1, B2, C2)   // R1: sizes (p4x, q4y, q4z), 2 tests
        else // (t1 = 2)
            t2 ← test(A1 ∪ B2)
            if t2 = 0 then (La, Lb, Lc) ← (A2, B1, C1)   // R2: sizes (q4x, p4y, p4z), 2 tests
            else if t2 = 1 then
                t3 ← test(C1)
                if t3 = 0 then (La, Lb, Lc) ← (A1, B1, C2)   // R2: sizes (p4x, p4y, q4z), 3 tests
                else (La, Lb, Lc) ← (A1, B1, C1)   // R3: sizes (p4x, p4y, p4z), 3 tests
            else /* (t2 = 2) */ (La, Lb, Lc) ← (A1, B2, C1)   // R2: sizes (p4x, q4y, p4z), 2 tests
    if ∃ two non-singleton sets (A and B) in L then Final2(A, B)
    else if ∃ one non-singleton set, A, in L then identify A's defective by using binary search

Figure 2: Final subroutine when d ≥ 3

Lemma 2 If we are given six sets, (A1, A2, B1, B2, C1, C2), such that each pair of sets
(X1, X2) consists of one pure and one tainted set, and that exactly one of the sub1 sets
(A1, B1, C1) is tainted, then one ternary test suffices to identify which three sets are tainted.

Proof: Test the union of sets A1 and B2.

If the test shows that the union is pure, then B2 is pure, and so B1 is the tainted sub1
set. Consequently, the tainted sets are (A2, B1, C2).

If the test shows that the union is tainted, then either (1) A1 is tainted and B2 is pure,
implying that B1 is also tainted, which is inconsistent with the lemma's hypothesis that there
is only one tainted sub1 set, or (2) A1 is pure and B2 is tainted, implying that B1 is pure
and thus that C1 is the one tainted sub1 set. Consequently, the tainted sets are (A2, B2, C1).

Finally, if the test shows that the union is impure, then both A1 and B2 are tainted, and
so A1 is the tainted sub1 set. Consequently, the tainted sets are (A1, B2, C2). ■
Lemma 3 If we are given six sets, (A1, A2, B1, B2, C1, C2), such that each pair of sets
(X1, X2) consists of one pure and one tainted set, and that at least two of the sub1 sets
(A1, B1, C1) are tainted, then two ternary tests suffice to identify which three sets are tainted.

Proof: Test the union of sets A1 and B2.

If the test shows that the union is pure, then A1 is pure and so the other two sub1 sets
must be tainted. Consequently, the tainted sets are (A2, B1, C1).

If the test shows that the union is tainted, then either (1) A1 is pure and B2 is tainted,
implying that B1 is also pure, which is inconsistent with the lemma's hypothesis that there
are at least two tainted sub1 sets, or (2) A1 is tainted and B2 is pure, implying that B1 is
also tainted. In this case, testing C1 will indicate either that C1 is pure, in which case the
tainted sets are (A1, B1, C2), or that C1 is tainted, in which case the tainted sets are (A1,
B1, C1).

Finally, if the test shows that the union is impure, then B2 is tainted and so B1 is pure.
We can conclude that the other two sub1 sets must be tainted. Consequently, the tainted
sets are (A1, B2, C1). ■
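Figure 2's branch structure and Lemmas 2 and 3 can be checked exhaustively, since one Final3 round only has to decide, for each of $L_a$, $L_b$, $L_c$, whether its defective lies in the small (sub1) part or the large part. The sketch below abstracts each part as a name (`"A1"`, `"A2"`, and so on) and simulates the anonymous ternary test on unions of names; the helper names are hypothetical. Running it over all eight configurations checks that the decoding is correct and uses 1 test (R0), 2 tests (R1), or 2 to 3 tests (R2, R3).

```python
from itertools import product

def make_named_tester(config):
    """Anonymous ternary oracle for one Final3 round.
    config = (a, b, c): a is True if La's defective lies in A1 (else in A2), etc."""
    tainted = {"A1" if config[0] else "A2",
               "B1" if config[1] else "B2",
               "C1" if config[2] else "C2"}
    return (lambda names: min(2, len(tainted & set(names)))), tainted

def final3_round(test):
    """Decide which of A1/A2, B1/B2, C1/C2 are tainted (Fig. 2, Lemmas 2 and 3).
    Returns (names of the tainted parts, number of tests used)."""
    t1 = test({"A1", "B1", "C1"})
    if t1 == 0:                                   # R0: all larger parts are tainted
        return ("A2", "B2", "C2"), 1
    t2 = test({"A1", "B2"})
    if t1 == 1:                                   # Lemma 2: exactly one sub1 set tainted
        return {0: ("A2", "B1", "C2"),
                1: ("A2", "B2", "C1"),
                2: ("A1", "B2", "C2")}[t2], 2
    if t2 == 0:                                   # Lemma 3: at least two sub1 sets tainted
        return ("A2", "B1", "C1"), 2
    if t2 == 2:
        return ("A1", "B2", "C1"), 2
    t3 = test({"C1"})                             # t2 == 1: A1 and B1 tainted; check C1
    return (("A1", "B1", "C2") if t3 == 0 else ("A1", "B1", "C1")), 3

if __name__ == "__main__":
    for config in product([False, True], repeat=3):
        test, tainted = make_named_tester(config)
        chosen, used = final3_round(test)
        print(config, set(chosen) == tainted, used)
```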



5.2 Analysis of Algorithm AN 

Let $W_d(n)$, for $d > 1$, be the worst-case number of tests made by AN(S) when |S| = n and
there turn out to be d defectives.

Theorem 4

$$W_2(n) \le 1.8756\lg n + o(\lg n)$$

and, for $d \ge 3$, $W_d(n) \le (0.3307 + 0.7202d)\lg n + o(\lg n)$.

Proof: We analyze Final2 to evaluate $W_2(n)$, and then analyze Final3 to evaluate $W_d(n)$,
for $d \ge 3$.

Analysis of Final2 We make use of the real root of the equation $p_3 = (1 - p_3)^3$, which is
solved by

$$p_3 = 1 - \sqrt[3]{\frac{\sqrt{93}}{18} + \frac{1}{2}} + \sqrt[3]{\frac{\sqrt{93}}{18} - \frac{1}{2}} \approx 0.3176722,$$

and of $q_3 = 1 - p_3 \approx 0.6823278$. We assign counts of tests performed in Final2 as follows.
Tests result in size reductions of the two sets, and we assign such tests to those sets in
proportion to the logarithms of the ratios of the before and after set sizes. For example, an
R1 scenario uses two tests to reduce sets of sizes x and y to sizes $p_3x$ and $q_3y$, and so we
assign the size-x set a count of $2\lg p_3/(\lg p_3 + \lg q_3)$ tests. We define the normalized cost to
be the assigned count divided by the logarithm of the size reduction. Accordingly, both sets
have the same normalized cost; for example, in an R1 scenario, both sets have
a normalized cost of $-2/(\lg p_3 + \lg q_3)$.

Let $W_1(n)$ be the worst-case number of tests made within Final2 that are assigned to a
set, having initial size n, to identify the defective item in that set. A set's defective item will
be identified when that set has been reduced to size 1. Therefore, $W_1(n)$ is the product of
$\lg n$ and the maximum normalized cost, $c_2$.

There are three scenarios of size reductions of the pair of sets. R0: 1 test reduces both
set sizes by a factor of $1/q_3$, each set is assigned a count of 1/2 test, and the normalized costs
are $-1/(2\lg q_3) \approx 0.9067$. R1: 2 tests reduce set sizes by factors of $1/p_3$ and $1/q_3$, the sets
are assigned counts $2\lg p_3/(\lg p_3 + \lg q_3)$ and $2\lg q_3/(\lg p_3 + \lg q_3)$ tests, and the normalized
costs are $-2/(\lg p_3 + \lg q_3) \approx 0.9067$. R2: 1 test reduces both set sizes by a factor of $1/p_3$,
each set is assigned a count of 1/2 test, and the normalized costs are $-1/(2\lg p_3) \approx 0.3022$.

Therefore, $c_2 \approx 0.9067$ and $W_1(n) \approx 0.9067\lg n$.

However, to the extent that a set uses binary search at the last line of Final2, it does not
participate in the size reductions of the R scenarios. In the worst case, scenario R1 recurs with one set's
size consistently being reduced by $p_3$ and the other's size by $q_3$. This can occur $-\lg n/\lg p_3$
times, at which point one set will have size 1 and the other will have size $n^{1 - \lg q_3/\lg p_3}$. This
second set will then require $(1 - \lg q_3/\lg p_3)\lg n$ tests, for a total of $[1 - (2 + \lg q_3)/\lg p_3]\lg n \approx
1.8756\lg n$ tests. Therefore, using $W_1(n) \approx 0.9067\lg n$ leaves an undercount of at most
$[1.8756 - 2(0.9067)]\lg n \approx 0.0622\lg n$.

We note that the problem of finding two defectives can be solved using about $1.4\lg n$
tests (see [17] and [18]), but these other methods are much more complicated and may not
admit concise implementations.



Evaluation of $W_2(n)$ We have the following recurrence.

$$W_2(n) = \max\begin{cases} 2 + W_1(q_2n) + W_1(p_2n) + 0.0622\lg n \\ 2 + W_2(p_2n) \\ 1 + W_2(q_2n) \end{cases} \tag{13}$$

Consider $W_2(n) = x\lg n + o(\lg n)$.

We use $c_2 \approx 0.9067$, defined above as the maximum normalized cost, giving $W_1(n) \approx
c_2\lg n$ (excluding the undercount).

If the first term of the recurrence were to be the maximum term, then $x = 2c_2 > 1.81$.
If the second term were to be the maximum term, then $x = -2/\lg p_2 \approx 1.44$. If the third
term were to be the maximum term, then $x = -1/\lg q_2 \approx 1.44$. Thus, the first term is the
maximum term and $W_2(n) \approx (2c_2 + 0.0622)\lg n \approx 1.8756\lg n$.
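The constants in the analysis of Final2 can be recomputed numerically. This sketch (function name hypothetical) evaluates the closed form for $q_3$, the normalized costs of R0, R1, and R2, the worst-case coefficient $1 - (2 + \lg q_3)/\lg p_3$, the resulting undercount, and the two competing terms of recurrence (13); it uses $p_2 = 0.38196601$ from Reduce.

```python
import math

def final2_constants(p2=0.38196601):
    """Numeric check of the Final2 analysis; costs are tests per unit of lg(size)."""
    lg = math.log2
    r = math.sqrt(93) / 18
    q3 = (r + 0.5) ** (1 / 3) - (r - 0.5) ** (1 / 3)   # real root of q^3 + q - 1 = 0
    p3 = 1 - q3                                       # hence p3 = (1 - p3)^3
    costs = {"R0": -1 / (2 * lg(q3)),
             "R1": -2 / (lg(p3) + lg(q3)),
             "R2": -1 / (2 * lg(p3))}
    c2 = max(costs.values())
    worst = 1 - (2 + lg(q3)) / lg(p3)                 # R1 repeated, then binary search
    undercount = worst - 2 * c2
    other_terms = (-2 / lg(p2), -1 / lg(1 - p2))      # 2nd and 3rd terms of (13)
    return {"p3": p3, "q3": q3, "costs": costs, "c2": c2, "worst": worst,
            "undercount": undercount, "other_terms": other_terms}
```

The returned values are approximately $p_3 = 0.3176722$, $c_2 = 0.9067$, a worst case of 1.8756, an undercount of 0.0622, and 1.44 for both competing terms, as stated above.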



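Analysis of Final3 We make use of the real root with value less than one of the equation
$p_4 = (1 - p_4)^4$, which is solved by

$$u = \sqrt[3]{\frac{\sqrt{849}}{18} + \frac{1}{2}} - \sqrt[3]{\frac{\sqrt{849}}{18} - \frac{1}{2}},$$

$$p_4 = 1 + \frac{\sqrt{u}}{2} - \frac{1}{2}\sqrt{\frac{2}{\sqrt{u}} - u} \approx 0.27550804, \tag{15}$$

and of $q_4 = 1 - p_4 \approx 0.72449196$.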

Tests result in size reductions of three sets. Our analysis is similar to that of Final2. We
assign tests to the three involved sets in proportion to the logarithms of the ratios of the
before and after set sizes. We define the normalized cost to be the assigned count divided
by the logarithm of the size reduction. All three involved sets will have the same normalized
cost.

Let $W_1(n)$ be the worst-case number of tests made within Final3 that are assigned to an
L-set, having initial size n, to identify the defective item in that set. An L-set's defective
item will be identified when that set has been reduced to size 1. Therefore, $W_1(n)$ is the
product of $\lg n$ and the maximum normalized cost, $c_3$.

There are four scenarios of size reductions of a triple of sets. R0: 1 test reduces three
set sizes by a factor of $1/q_4$, and the normalized costs are $-1/(3\lg q_4) \approx 0.7169$. R1: 2 tests
reduce one set size by a factor of $1/p_4$ and the other two set sizes by a factor of $1/q_4$, and the
normalized costs are $-2/(\lg p_4 + 2\lg q_4) \approx 0.7169$. R2: 2 or 3 tests reduce two of the set sizes
by a factor of $1/p_4$ and the other set size by a factor of $1/q_4$, and the normalized costs are
either $-2/(2\lg p_4 + \lg q_4) \approx 0.4779$ or $-3/(2\lg p_4 + \lg q_4) \approx 0.7169$. R3: 3 tests reduce all
three of the set sizes by a factor of $1/p_4$, and the normalized costs are $-3/(3\lg p_4) \approx 0.5376$.

Therefore, $c_3 \approx 0.7169$ and $W_1(n) \approx 0.7169\lg n$.

However, to the extent that a set uses Final2 or binary search at the last lines of Final3,
it does not participate in the size reductions of the R scenarios. In the worst case, scenario R1 recurs
with two sets' sizes consistently being reduced by $q_4$ and the other set size by $p_4$. This can
occur $-\lg n/\lg p_4$ times, at which point the two sets will have size $n^{1 - \lg q_4/\lg p_4} = n^{0.75}$ and
the one set will have size 1. These two sets will then be reduced by Final2, requiring at most
$(1.8756)(0.75)\lg n \approx 1.406\lg n$ tests, for a total of $(1.406 - 2/\lg p_4)\lg n \approx 2.4814\lg n$ tests.
Then $dW_1(n)$ represents an undercount of the worst-case total number of tests in Final3 by
at most $[2.4814 - 3(0.7169)]\lg n \approx 0.3307\lg n$.



Evaluation of $W_d(n)$ Define $W'_d(n)$ to be $W_d(n)$ minus the undercount of $dW_1(n)$ in Final3,
which is at most $0.3307\lg n$. Then, for $d \ge 2$, we have the following recurrence.

$$W'_d(n) = \max\begin{cases} 2 + W'_i(p_2n) + W'_{d-i}(q_2n), & \text{for } 1 \le i \le d - 1 \\ 2 + W'_d(p_2n) \\ 1 + W'_d(q_2n) \end{cases} \tag{16}$$

Consider $W'_d(n) = x\lg n + o(\lg n)$, and we shall solve for x. Assume that, for $2 \le i < d$,
$W'_i(n) = x_i\lg n + o(\lg n)$, where $x_i \ge i\,c_3$ with $c_3 \approx 0.7169$, and that $W'_2(n) = 2c_2\lg n + o(\lg n)$, where
$c_2 \approx 0.7202$ is obtained from the solution to $W'_2(n) = \max\{2 + W'_2(p_2n), 1 + W'_2(q_2n)\}$.

If the first term of the recurrence were to be the maximum term, then $x \ge dc_3 \ge 2.15$,
since $d \ge 3$. If the second term were to be the maximum term, then $x = -2/\lg p_2 \approx 1.44$.
If the third term were to be the maximum term, then $x = -1/\lg q_2 \approx 1.44$. Therefore, the
first term is the maximum term and we obtain $W_d(n) \le (0.3307 + 0.7202d)\lg n + o(\lg n)$. ■
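The Final3 constants and the final coefficient of Theorem 4 can be checked the same way. This sketch (function name hypothetical) evaluates the closed form (15) for $p_4$, the normalized costs of R0 through R3, the worst case in which two sets shrink by $q_4$ while one shrinks by $p_4$, and, as our reading of the last step, the per-defective coefficient $\max(c_3, -1/(2\lg q_2))$. Because it does not round intermediate values, its undercount comes out near 0.3313 rather than 0.3307; the difference comes from the text rounding $1.8756 \times 0.75$ down to 1.406.

```python
import math

def final3_constants(p2=0.38196601):
    """Numeric check of the Final3 analysis and of the 0.7202 coefficient."""
    lg = math.log2
    r = math.sqrt(849) / 18
    u = (r + 0.5) ** (1 / 3) - (r - 0.5) ** (1 / 3)   # root of u^3 + 4u - 1 = 0
    p4 = 1 + math.sqrt(u) / 2 - 0.5 * math.sqrt(2 / math.sqrt(u) - u)   # closed form (15)
    q4 = 1 - p4
    costs = {"R0": -1 / (3 * lg(q4)),
             "R1": -2 / (lg(p4) + 2 * lg(q4)),
             "R2 (2 tests)": -2 / (2 * lg(p4) + lg(q4)),
             "R2 (3 tests)": -3 / (2 * lg(p4) + lg(q4)),
             "R3": -3 / (3 * lg(p4))}
    c3 = max(costs.values())
    r3 = math.sqrt(93) / 18                           # Final2 worst case, as before
    q3 = (r3 + 0.5) ** (1 / 3) - (r3 - 0.5) ** (1 / 3)
    w2 = 1 - (2 + lg(q3)) / lg(1 - q3)
    exponent = 1 - lg(q4) / lg(p4)                    # the two survivors have size n^exponent
    worst = w2 * exponent - 2 / lg(p4)                # Final2 on survivors + R1 rounds
    undercount = worst - 3 * c3
    per_defective = max(c3, -1 / (2 * lg(1 - p2)))
    return {"p4": p4, "costs": costs, "c3": c3, "exponent": exponent,
            "worst": worst, "undercount": undercount, "per_defective": per_defective}
```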



6 Using Counting Queries 

In this section, we discuss a variant of our testing algorithm for the case when the queries
provide an exact count of the number of defectives in a test set, and a
1-result identifies the defective item in the test set. As we show, the expected performance of
this algorithm is significantly better than that of the ternary-result group testing algorithm.

We apply an initial spreading action to distribute items across a set of buckets and we
then perform a test for each bucket. The main difference is in the binary tree algorithm we
then apply to each bucket B whose test indicates it has $t \ge 2$ defective items:

1. We set a partition factor, p, according to the analysis, and we split B into subsets $B_1$
and $B_2$ so that $B_1$ has $p|B|$ items from B and $B_2$ has the remaining items.

2. We perform a test for $B_1$ and, if the number, $t_1$, of defective items in $B_1$ is at least
two, then we recursively search in $B_1$.

3. If the (possibly recursive) testing of $B_1$ has revealed all t defective items from B, then
we skip the testing of $B_2$, for it contains no defective items in this case.

4. Otherwise, if the test for $B_1$ revealed $t_1 = t - 1$ defectives, then we immediately test
$B_2$ to identify its one defective item.

5. If, on the other hand, the test for $B_1$ revealed $t_1$ defectives, with $0 \le t_1 < t - 1$, then
we recursively search in $B_2$ (without performing a global test for $B_2$, since we know it
must have exactly $t - t_1 \ge 2$ defectives).

Note that no deferral is needed in this algorithm, because we can always infer whether or
not testing in a $B_2$ set will be profitable.
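The counting variant's recursion can be sketched as follows. The counting oracle is simulated from a known defective set (a 1-result also names the item), d is assumed known for the spreading step, and the names `make_counter`, `counting_search`, and `counting_identify` are hypothetical. The step numbers in the comments refer to the list above.

```python
import random

def make_counter(defective):
    """Counting query: number of defectives in the set; a 1-result also names it."""
    stats = {"tests": 0}
    def count(items):
        stats["tests"] += 1
        hits = [x for x in items if x in defective]
        return len(hits), (hits[0] if len(hits) == 1 else None)
    return count, stats

def counting_search(B, t, count, p, found):
    """Binary tree search of a bucket B known to hold t >= 2 defectives."""
    while t >= 2:
        cut = min(len(B) - 1, max(1, round(p * len(B))))    # step 1
        B1, B2 = B[:cut], B[cut:]
        t1, x = count(B1)                                   # step 2
        if t1 == 1:
            found.add(x)
        elif t1 >= 2:
            counting_search(B1, t1, count, p, found)
        if t1 == t:                                         # step 3: B2 is pure
            return
        if t1 == t - 1:                                     # step 4: B2 has one defective
            _, x2 = count(B2)
            found.add(x2)
            return
        B, t = B2, t - t1                                   # step 5: no global test of B2

def counting_identify(items, defective, s=0.58, p=0.4715, seed=0):
    """Spread items over about s*d buckets, test each, search multi-defective ones."""
    rng = random.Random(seed)
    count, stats = make_counter(defective)
    d = len(defective)              # assumption: d (or an estimate of it) is known
    buckets = [[] for _ in range(max(1, round(s * d)))]
    for x in items:
        rng.choice(buckets).append(x)
    found = set()
    for K in buckets:
        t, x = count(K)
        if t == 1:
            found.add(x)
        elif t >= 2:
            counting_search(K, t, count, p, found)
    return found, stats["tests"]
```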

6.1 Analysis of the Counting Algorithm

We begin by analyzing the expected number of tests of the counting binary tree algorithm,
without performing any spreading action. Our analysis parallels the one we performed for
Deferral.

Let $E_d$ be the expected number of tests performed by the binary tree counting algorithm
(not counting the global test for all of the set), assuming there are d defectives. We evaluate
$E_d$ for small values of d. By construction, $E_0 = E_1 = 0$. If d = 2, then the 2-0 case causes
1 test and a recursive call, the 1-1 case causes 2 tests, and the 0-2 case causes 1 test and a
recursive call. Thus, letting $q = 1 - p$,

$$E_2 = p^2(E_2 + 1) + 2pq(2) + q^2(E_2 + 1) = p^2E_2 + (p^2 + 2pq + q^2) + 2pq + q^2E_2,$$

so that

$$E_2 = \frac{1 + 2pq}{1 - p^2 - q^2} = \frac{1 + 2pq}{2pq}.$$

Likewise, if d = 3, then the 3-0 case causes 1 test and a recursive call, the 2-1 case causes
2 tests and a 2-defective recursive call, while the 1-2 case causes 1 test and a 2-defective
recursive call, and the 0-3 case causes 1 test and a recursive call. Thus,

$$E_3 = p^3(E_3 + 1) + 3p^2q(E_2 + 2) + 3pq^2(E_2 + 1) + q^3(E_3 + 1) = \frac{1 + 3p^2q + (3p^2q + 3pq^2)E_2}{1 - p^3 - q^3}.$$

Similarly,

$$E_4 = \frac{1 + 4p^3q + (4p^3q + 4pq^3)E_3 + 12p^2q^2E_2}{1 - p^4 - q^4}.$$

Likewise,

$$E_5 = \frac{1 + 5p^4q + (5p^4q + 5pq^4)E_4 + (10p^3q^2 + 10p^2q^3)(E_2 + E_3)}{1 - p^5 - q^5}.$$

Finally (which will be sufficient for our analysis),

$$E_6 = \frac{1 + 6p^5q + (6p^5q + 6pq^5)E_5 + (15p^4q^2 + 15p^2q^4)(E_2 + E_4) + 40p^3q^3E_3}{1 - p^6 - q^6}.$$

These values can then be combined with an analysis (as given above) for bounding the
number of buckets of various sizes to derive an expected bound on the number of tests
performed by our algorithm. For example, if we choose a spread factor of s = 0.58 and a
split parameter p = 0.4715, then we find that the expected number of tests is less than 1.896d, which is significantly better than
that obtained by the ternary-result group testing algorithm.
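The counting recurrences also generalize to any $d$: with $j$ defectives in $B_1$, the case $j = d - 1$ costs one extra test on $B_2$, the cases $1 \le j \le d-2$ handle both halves, and $j = 0$ or $j = d$ return to a $d$-defective set. The sketch below (function names hypothetical) computes $E_d$ this way and, assuming the Poisson limit of $P_s(k)$ and no deferral, combines the values into the expected tests per defective, $s(1 + \sum_k P_s(k)E_k)$.

```python
from math import comb, exp, factorial

def counting_expected_tests(p, dmax):
    """E_d for the counting binary tree search (no spreading), d = 0..dmax."""
    q = 1 - p
    E = [0.0, 0.0]
    for d in range(2, dmax + 1):
        num = 1.0 + d * p ** (d - 1) * q          # extra test on B2 when B1 holds d - 1
        for j in range(1, d):                     # j defectives in B1, d - j in B2
            w = comb(d, j) * p ** j * q ** (d - j)
            num += w * (E[d - 1] if j == d - 1 else E[j] + E[d - j])
        E.append(num / (1 - p ** d - q ** d))     # j = 0 and j = d moved to the left
    return E

def counting_tests_per_defective(s, p, kmax=40):
    """s * (1 + sum_k P_s(k) E_k), using the Poisson limit of P_s(k) (an assumption)."""
    E = counting_expected_tests(p, kmax)
    lam = 1 / s
    return s * (1 + sum(exp(-lam) * lam ** k / factorial(k) * E[k]
                        for k in range(kmax + 1)))
```

For $s = 0.58$ and $p = 0.4715$ this gives roughly 1.89, consistent with the $1.896d$ bound above.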

7 Conclusion 

We have presented several concise algorithms for ternary-result group testing, including a
randomized algorithm for the identifying case that uses O(d) tests with high probability, a
deterministic algorithm for the identifying case that uses O(d log n) tests, and a deterministic
algorithm for the anonymous case that uses O(d log n) tests. We leave as an open problem
whether there is a deterministic group testing algorithm for the identifying case that uses
O(d) tests.